The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice and the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a considerable fraction of participants (32%) stated that they did not have enough time for method development, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
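For readers unfamiliar with the practices reported above, the following is a minimal, self-contained sketch (not taken from any surveyed solution) of two of the most common strategies, patch-based training data preparation and k-fold cross-validation over cases, using NumPy and scikit-learn on toy synthetic volumes; the volume sizes, patch size, and fold count are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)

def random_patches(volume, patch_size=(64, 64, 64), n_patches=4):
    """Sample random 3D patches so large volumes never have to be processed whole."""
    patches = []
    for _ in range(n_patches):
        start = [rng.integers(0, s - p + 1) for s, p in zip(volume.shape, patch_size)]
        sl = tuple(slice(st, st + p) for st, p in zip(start, patch_size))
        patches.append(volume[sl])
    return np.stack(patches)

# Toy stand-in data: 20 synthetic volumes.
volumes = [rng.random((96, 96, 96), dtype=np.float32) for _ in range(20)]

# 5-fold cross-validation over whole cases (splitting cases, not patches, avoids leakage).
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(kfold.split(volumes)):
    train_patches = np.concatenate([random_patches(volumes[i]) for i in train_idx])
    print(f"fold {fold}: {len(train_idx)} train cases -> {train_patches.shape[0]} patches, "
          f"{len(val_idx)} validation cases")
```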
Advanced driver assistance systems (ADAS) are designed to improve vehicle safety. However, it is difficult to realize such benefits without understanding the causes and limitations of current ADAS and their possible solutions. This study 1) investigates the limitations of ADAS and their solutions through a literature review, 2) identifies the causes and effects of ADAS failures from consumer complaints using natural language processing models, and 3) compares the main differences between the two. Both lines of research identified similar categories of ADAS failure causes, including human factors, environmental factors, and vehicle factors. However, academic research focused more on the human factors behind ADAS problems and proposed advanced algorithms to mitigate them, whereas drivers complained more about vehicle-factor failures of ADAS, which led to the most severe consequences. The findings from the two sources tend to complement each other and provide important implications for improving ADAS in the future.
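As a purely illustrative sketch of the complaint-mining step (the paper's actual NLP models and the complaint corpus are not reproduced here), the snippet below applies a standard topic model from scikit-learn to a few invented complaint narratives to surface latent failure themes.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical complaint narratives standing in for real consumer reports.
complaints = [
    "automatic emergency braking activated with no obstacle ahead on the highway",
    "lane keeping assist pulled the car toward the shoulder in heavy rain",
    "adaptive cruise control failed to detect the stopped vehicle and did not slow down",
    "forward collision warning kept triggering near metal guardrails",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(complaints)

# Cluster complaints into a small number of latent failure themes.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"theme {k}: {', '.join(top)}")
```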
Pedestrian safety has become an important research topic across a variety of studies due to the increasing number of pedestrian-involved crashes. To proactively evaluate pedestrian safety, surrogate safety measures (SSMs) have been widely used in traffic-conflict-based studies, as they do not require historical crash data as input. However, most existing SSMs were developed under the assumption that road users maintain a constant speed and direction. Risk estimates based on this assumption are less stable, more likely to be exaggerated, and unable to capture drivers' evasive maneuvers. Given these limitations of existing SSMs, this study proposes a probabilistic framework for estimating pedestrian-vehicle risk at intersections. The proposed framework relaxes the constant-velocity limitation by predicting trajectories with Gaussian process regression and accounting for different possible driver maneuvers with a random forest model. Real-world LiDAR data collected at an intersection were used to evaluate the performance of the proposed framework. The newly developed framework was able to identify all pedestrian-vehicle conflicts. Compared with existing constant-velocity-based estimates, the proposed framework provided more stable risk estimates and captured vehicles' evasive maneuvers. Moreover, the proposed framework does not require expensive computational resources, which makes it an ideal choice for real-time, proactive pedestrian safety solutions at intersections.
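The two learning components named above can be illustrated with off-the-shelf scikit-learn estimators; the sketch below uses synthetic trajectory and maneuver data and is not the authors' implementation, only a minimal demonstration of Gaussian process trajectory prediction with uncertainty and random-forest maneuver classification.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic observed vehicle trajectory: position along the approach over time, with noise.
t_obs = np.linspace(0, 3, 30).reshape(-1, 1)            # seconds
x_obs = 12.0 * t_obs.ravel() + rng.normal(0, 0.3, 30)   # metres

# 1) Gaussian process regression predicts future positions with uncertainty,
#    relaxing the constant-velocity assumption of classical SSMs.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(0.1))
gpr.fit(t_obs, x_obs)
t_future = np.linspace(3, 5, 20).reshape(-1, 1)
x_mean, x_std = gpr.predict(t_future, return_std=True)

# 2) A random forest classifies the likely driver maneuver from simple
#    kinematic features (speed, deceleration, distance to crosswalk), all synthetic here.
features = rng.normal(size=(200, 3))
maneuvers = rng.integers(0, 3, size=200)   # 0: keep speed, 1: brake, 2: swerve
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(features, maneuvers)
maneuver_probs = clf.predict_proba(rng.normal(size=(1, 3)))

# A collision probability can then be estimated by weighting sampled GP trajectories
# by the predicted maneuver probabilities.
print(x_mean[:3], x_std[:3], maneuver_probs)
```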
As a prevalent distributed learning paradigm, federated learning (FL) trains a global model over a large number of communicating devices. This paper studies a class of composite optimization and statistical recovery problems in the FL setting, whose loss function consists of a data-dependent smooth loss and a non-smooth regularizer. Examples include sparse linear regression with the lasso and low-rank matrix recovery with nuclear norm regularization, among others. In the existing literature, federated composite optimization algorithms are designed only from an optimization perspective, without any statistical guarantees. Moreover, they do not account for the (restricted) strong convexity that is common in statistical recovery problems. We advance the frontier of this problem from both the optimization and statistical perspectives. On the optimization front, we propose a new algorithm named Fast Federated Dual Averaging for strongly convex and smooth losses and establish state-of-the-art iteration and communication complexity in the composite setting; in particular, we prove that it enjoys a fast rate, linear speedup, and reduced communication rounds. On the statistical front, for restricted strongly convex and smooth losses, we design another algorithm, Multi-stage Federated Dual Averaging, and prove a high-probability complexity bound with linear speedup up to the optimal statistical precision. Experiments on synthetic and real data show that our methods outperform other baselines. To the best of our knowledge, this is the first work to provide fast optimization algorithms and statistical recovery guarantees for composite problems in FL.
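To make the problem class concrete, here is a minimal sketch of a plain federated proximal-gradient baseline for a lasso-type composite objective (smooth least-squares loss plus an L1 regularizer handled by soft-thresholding). It is not the Fast or Multi-stage Federated Dual Averaging algorithms of the paper; client counts, step sizes, and data are illustrative.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||w||_1 (the non-smooth lasso regularizer)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def federated_prox_gd(clients, lam=0.1, eta=0.1, rounds=100, local_steps=5):
    """Plain federated proximal gradient for a lasso-type composite objective.

    Each client runs a few local proximal-gradient steps on its own least-squares
    loss; the server averages the primal iterates once per communication round.
    A simple baseline to illustrate the problem class, not the paper's algorithms.
    """
    d = clients[0][0].shape[1]
    w_server = np.zeros(d)
    for _ in range(rounds):
        w_locals = []
        for A, b in clients:
            w = w_server.copy()
            for _ in range(local_steps):
                grad = A.T @ (A @ w - b) / len(b)              # gradient of the smooth part
                w = soft_threshold(w - eta * grad, eta * lam)  # prox step for the L1 part
            w_locals.append(w)
        w_server = np.mean(w_locals, axis=0)                   # one communication round
    return w_server

# Toy federated lasso: 4 clients observe noisy measurements of one sparse vector.
rng = np.random.default_rng(0)
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]
clients = []
for _ in range(4):
    A = rng.normal(size=(50, 20))
    clients.append((A, A @ w_true + 0.05 * rng.normal(size=50)))
print(np.round(federated_prox_gd(clients), 2))
```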
Human parsing aims to partition humans in images or videos into multiple pixel-level semantic parts. In the last decade, it has attracted increasing interest in the computer vision community and has been utilized in a broad range of practical applications, from security monitoring to social media to visual special effects, just to name a few. Although deep learning-based human parsing solutions have made remarkable achievements, many important concepts, existing challenges, and potential research directions remain unclear. In this survey, we comprehensively review three core sub-tasks: single human parsing, multiple human parsing, and video human parsing, introducing their respective task settings, background concepts, relevant problems and applications, representative literature, and datasets. We also present quantitative performance comparisons of the reviewed methods on benchmark datasets. Additionally, to promote sustainable development of the community, we put forward a transformer-based human parsing framework, providing a high-performance baseline for follow-up research through universal, concise, and extensible solutions. Finally, we point out a set of under-investigated open issues in this field and suggest new directions for future study. We also provide a regularly updated project page to continuously track recent developments in this fast-advancing field: https://github.com/soeaver/awesome-human-parsing.
Recent CLIP-guided 3D optimization methods, e.g., DreamFields and PureCLIPNeRF, achieve great success in zero-shot text-guided 3D synthesis. However, due to training from scratch with random initialization and no prior knowledge, these methods often fail to generate accurate and faithful 3D structures that conform to the corresponding text. In this paper, we make the first attempt to introduce an explicit 3D shape prior into CLIP-guided 3D optimization methods. Specifically, we first generate a high-quality 3D shape from the input text in a text-to-shape stage and use it as the 3D shape prior. We then use it to initialize a neural radiance field, which is optimized with the full prompt. For text-to-shape generation, we present a simple yet effective approach that directly bridges the text and image modalities with a powerful text-to-image diffusion model. To narrow the style domain gap between images synthesized by the text-to-image model and the shape renderings used to train the image-to-shape generator, we further propose to jointly optimize a learnable text prompt and fine-tune the text-to-image diffusion model for rendering-style image generation. Our method, Dream3D, is capable of generating imaginative 3D content with better visual quality and shape accuracy than state-of-the-art methods.
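The core CLIP-guidance loop shared by the methods discussed above can be sketched as follows; the prompt is hypothetical, and a learnable image tensor stands in for a differentiable rendering of the 3D representation (in Dream3D this would be a view of the NeRF initialized from the text-to-shape prior), so this is only an assumption-laden illustration, not the paper's pipeline.

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cpu"  # keep the sketch simple; CLIP loads fp32 weights on CPU
model, _ = clip.load("ViT-B/32", device=device)
for p in model.parameters():
    p.requires_grad_(False)

prompt = "a wooden sailing ship"  # hypothetical prompt
with torch.no_grad():
    text_feat = model.encode_text(clip.tokenize([prompt]).to(device))
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

# Stand-in for one differentiable rendering of the 3D representation.
rendered_view = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
optimizer = torch.optim.Adam([rendered_view], lr=1e-2)

for step in range(50):
    img_feat = model.encode_image(rendered_view)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    loss = -(img_feat * text_feat).sum()   # maximise CLIP image-text similarity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print("final CLIP loss:", loss.item())
```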
Optimal transport (OT) has become a widely used tool in the machine learning field to measure the discrepancy between probability distributions. For instance, OT is a popular loss function that quantifies the discrepancy between an empirical distribution and a parametric model. Recently, an entropic penalty term and the celebrated Sinkhorn algorithm have been commonly used to approximate the original OT in a computationally efficient way. However, since the Sinkhorn algorithm runs a projection associated with the Kullback-Leibler divergence, it is often vulnerable to outliers. To overcome this problem, we propose regularizing OT with the $\beta$-potential term associated with the so-called $\beta$-divergence, which was developed in robust statistics. Our theoretical analysis reveals that the $\beta$-potential can prevent the mass from being transported to outliers. We experimentally demonstrate that the transport matrix computed with our algorithm helps estimate a probability distribution robustly even in the presence of outliers. In addition, our proposed method can successfully detect outliers from a contaminated dataset.
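For context, the snippet below implements the standard entropy-regularized Sinkhorn iterations that the paper starts from; the proposed $\beta$-potential regularization is not reproduced, and the point clouds, outlier location, and regularization strength are illustrative. Because the marginals are fixed, the KL-based plan is forced to route mass to the outlier, which is the behavior the $\beta$-potential approach is designed to avoid.

```python
import numpy as np

def sinkhorn(a, b, C, eps=1.0, n_iters=500):
    """Standard entropy-regularized OT via Sinkhorn iterations.

    a, b are source/target histograms, C is the cost matrix, and eps the entropic
    regularization strength. The paper's beta-potential regularization, which
    modifies this KL-based projection to down-weight outliers, is not shown here.
    """
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # transport plan

# Toy example: two small point clouds, the target containing one clear outlier.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(30, 2))
y = np.vstack([rng.normal(0.5, 1.0, size=(29, 2)), [[6.0, 6.0]]])  # last point is the outlier
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
a = np.full(30, 1 / 30)
b = np.full(30, 1 / 30)

P = sinkhorn(a, b, C)
# With fixed marginals, KL-based Sinkhorn must still send the full 1/30 mass to the outlier.
print("mass sent to the outlier column:", P[:, -1].sum())
```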
In the era of the Internet of Things (IoT), network-wide anomaly detection is a crucial part of monitoring IoT networks due to the inherent security vulnerabilities of most IoT devices. Principal Component Analysis (PCA) has been proposed to separate network traffic into two disjoint subspaces corresponding to normal and malicious behaviors for anomaly detection. However, privacy concerns and the limitations of devices' computing resources compromise the practical effectiveness of PCA. We propose a federated PCA-based Grassmannian optimization framework that coordinates IoT devices to aggregate a joint profile of normal network behaviors for anomaly detection. First, we introduce a privacy-preserving federated PCA framework to simultaneously capture the traffic profiles of various IoT devices. Then, we investigate gradient-based learning with the alternating direction method of multipliers on the Grassmann manifold to guarantee fast training and low detection latency with limited computational resources. Empirical results on the NSL-KDD dataset demonstrate that our method outperforms baseline approaches. Finally, we show that the Grassmann manifold algorithm is highly suited to IoT anomaly detection, permitting a drastic reduction in the system's analysis time. To the best of our knowledge, this is the first federated PCA algorithm for anomaly detection that meets the requirements of IoT networks.
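The classical PCA-subspace detector that the federated framework builds on can be sketched in a few lines; this is not the federated Grassmannian algorithm or the NSL-KDD setup, only a minimal illustration of splitting features into normal and residual subspaces and thresholding the squared prediction error, with synthetic data standing in for traffic features.

```python
import numpy as np

def pca_subspace_detector(X_train, k=3):
    """Classical PCA-subspace anomaly detector.

    Projects traffic features onto the residual subspace and flags samples whose
    squared prediction error (SPE) exceeds an empirical threshold. The federated
    and Grassmannian aggregation of the paper is not reproduced here.
    """
    mu = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    P = Vt[:k].T                                  # top-k principal directions (normal subspace)

    def spe(X):
        R = (X - mu) - (X - mu) @ P @ P.T         # residual-subspace component
        return (R ** 2).sum(axis=1)

    threshold = np.percentile(spe(X_train), 99)   # simple empirical threshold
    return spe, threshold

# Synthetic "normal" traffic features plus a few injected anomalies.
rng = np.random.default_rng(0)
normal = rng.normal(size=(1000, 10)) @ rng.normal(size=(10, 10)) * 0.3
anomalies = rng.normal(loc=4.0, size=(10, 10))

spe, thr = pca_subspace_detector(normal)
scores = spe(np.vstack([normal[:5], anomalies]))
print((scores > thr).astype(int))   # expect mostly 0s for the normal rows, 1s for the anomalies
```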
To reproduce the success of text-to-image (T2I) generation, recent works on text-to-video (T2V) generation employ large-scale text-video datasets for fine-tuning. However, this paradigm is computationally expensive. Humans have the remarkable ability to learn new visual concepts from just a single exemplar. We therefore study a new T2V generation problem, One-Shot Video Generation, in which only a single text-video pair is available for training an open-domain T2V generator. Intuitively, we propose to adapt a T2I diffusion model pretrained on massive image data for T2V generation. We make two key observations: 1) T2I models are able to generate images that align well with verb terms; 2) extending T2I models to generate multiple images concurrently exhibits surprisingly good content consistency. To further learn continuous motion, we propose Tune-A-Video with a tailored Sparse-Causal Attention, which generates videos from text prompts via efficient one-shot tuning of pretrained T2I diffusion models. Tune-A-Video is capable of producing temporally coherent videos across applications such as subject or background change, attribute editing, and style transfer, demonstrating the versatility and effectiveness of our method.
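A simplified, single-head PyTorch sketch of the sparse-causal attention pattern described above (queries from the current frame, keys and values only from the first and previous frames) is given below; dimensions are toy values, and details of the real implementation such as multi-head projections are omitted.

```python
import torch
from torch import nn

class SparseCausalAttention(nn.Module):
    """Simplified single-head sketch of sparse-causal attention.

    Queries come from the current frame; keys and values are taken only from the
    first frame and the previous frame, which is the sparse attention pattern
    described for Tune-A-Video (multi-head projections and other details omitted).
    """
    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.scale = dim ** -0.5

    def forward(self, x):                            # x: (frames, tokens, dim)
        f, n, d = x.shape
        out = []
        for i in range(f):
            first = x[0]                             # first frame anchors content
            prev = x[max(i - 1, 0)]                  # previous frame (or itself for frame 0)
            q = self.to_q(x[i])                              # (n, d)
            kv_src = torch.cat([first, prev], dim=0)         # (2n, d)
            k, v = self.to_k(kv_src), self.to_v(kv_src)
            attn = (q @ k.T * self.scale).softmax(dim=-1)
            out.append(attn @ v)
        return torch.stack(out)

# Toy usage: 8 frames, 16 spatial tokens, 64 channels.
video_tokens = torch.randn(8, 16, 64)
print(SparseCausalAttention(64)(video_tokens).shape)   # torch.Size([8, 16, 64])
```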
In this paper, we propose a novel architecture, the Enhanced Interactive Transformer (EIT), to address the issue of head degradation in self-attention mechanisms. Our approach replaces the traditional multi-head self-attention mechanism with the Enhanced Multi-Head Attention (EMHA) mechanism, which relaxes the one-to-one mapping constraint between queries and keys, allowing each query to attend to multiple keys. Furthermore, we introduce two interaction models, Inner-Subspace Interaction and Cross-Subspace Interaction, to fully utilize the many-to-many mapping capabilities of EMHA. Extensive experiments on a wide range of tasks (e.g., machine translation, abstractive summarization, grammar correction, language modelling, and automatic brain disease diagnosis) show its superiority, with a very modest increase in model size.
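The exact EMHA formulation is not spelled out in the abstract, so the following PyTorch snippet is only a rough, assumption-laden sketch of the underlying idea: every query subspace (head) attends over the keys of all key subspaces, a many-to-many pairing in contrast to the one-to-one head pairing of standard multi-head attention.

```python
import torch
from torch import nn

class ManyToManyAttention(nn.Module):
    """Rough sketch of relaxing the one-to-one query/key head pairing.

    Standard multi-head attention pairs query head h only with key/value head h.
    Here every query head attends over the keys of all heads, giving a
    many-to-many mapping between query and key subspaces. This is an illustrative
    assumption, not the exact EMHA of the EIT paper.
    """
    def __init__(self, dim, heads=4):
        super().__init__()
        assert dim % heads == 0
        self.h, self.dh = heads, dim // heads
        self.q, self.k, self.v = (nn.Linear(dim, dim, bias=False) for _ in range(3))
        self.out = nn.Linear(heads * dim, dim, bias=False)

    def forward(self, x):                              # x: (batch, seq, dim)
        b, n, _ = x.shape
        split = lambda t: t.view(b, n, self.h, self.dh).transpose(1, 2)  # (b, h, n, dh)
        q, k, v = split(self.q(x)), split(self.k(x)), split(self.v(x))
        outputs = []
        for qi in range(self.h):                       # each query head ...
            for ki in range(self.h):                   # ... attends to every key head
                attn = (q[:, qi] @ k[:, ki].transpose(-2, -1)) / self.dh ** 0.5
                outputs.append(attn.softmax(dim=-1) @ v[:, ki])          # (b, n, dh)
        return self.out(torch.cat(outputs, dim=-1))    # concatenate the h*h head outputs

print(ManyToManyAttention(64, heads=4)(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```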